Collaborating Authors: mturk worker




Incorporating Worker Perspectives into MTurk Annotation Practices for NLP

Huang, Olivia, Fleisig, Eve, Klein, Dan

arXiv.org Artificial Intelligence

Current practices regarding data collection for natural language processing on Amazon Mechanical Turk (MTurk) often rely on a combination of studies on data quality and heuristics shared among NLP researchers. However, without considering the perspectives of MTurk workers, these approaches are susceptible to issues regarding workers' rights and poor response quality. We conducted a critical literature review and a survey of MTurk workers aimed at addressing open questions regarding best practices for fair payment, worker privacy, data quality, and considering worker incentives. We found that worker preferences are often at odds with received wisdom among NLP researchers. Surveyed workers preferred reliable, reasonable payments over uncertain, very high payments; reported frequently lying on demographic questions; and expressed frustration at having work rejected with no explanation. We also found that workers view some quality control methods, such as requiring minimum response times or Master's qualifications, as biased and largely ineffective. Based on the survey results, we provide recommendations on how future NLP studies may better account for MTurk workers' experiences in order to respect workers' rights and improve data quality.


Needle in a Haystack: An Analysis of High-Agreement Workers on MTurk for Summarization

Zhang, Lining, Mille, Simon, Hou, Yufang, Deutsch, Daniel, Clark, Elizabeth, Liu, Yixin, Mahamood, Saad, Gehrmann, Sebastian, Clinciu, Miruna, Chandu, Khyathi, Sedoc, João

arXiv.org Artificial Intelligence

To prevent the costly and inefficient use of resources on low-quality annotations, we want a method for creating a pool of dependable annotators who can effectively complete difficult tasks, such as evaluating automatic summarization. Thus, we investigate the recruitment of high-quality Amazon Mechanical Turk workers via a two-step pipeline. We show that we can successfully filter out subpar workers before they carry out the evaluations and obtain high-agreement annotations with similar constraints on resources. Although our workers demonstrate strong consensus among themselves and with CloudResearch workers, their alignment with expert judgments on a subset of the data is weaker than expected, indicating that further training on correctness is needed. Nevertheless, this paper serves as a guide to best practices for recruiting qualified annotators for other challenging annotation tasks.
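The two-step pipeline described above maps naturally onto MTurk's qualification system; the sketch below is a minimal illustration of that pattern using boto3, not the authors' exact setup (the qualification name, worker IDs, and threshold are hypothetical):

```python
import boto3

# Sandbox endpoint for safe testing; drop endpoint_url for production.
mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

# Step 1: create a custom qualification marking workers who passed the
# screening round (name and description are hypothetical).
qual = mturk.create_qualification_type(
    Name="passed-summarization-screening",
    Description="Agreed with reference judgments on the screening task",
    QualificationTypeStatus="Active",
)
qual_id = qual["QualificationType"]["QualificationTypeId"]

# Grant the qualification to each worker who met the screening bar.
for worker_id in ["A1EXAMPLE", "A2EXAMPLE"]:  # hypothetical worker IDs
    mturk.associate_qualification_with_worker(
        QualificationTypeId=qual_id,
        WorkerId=worker_id,
        IntegerValue=1,
        SendNotification=False,
    )

# Step 2: attach the qualification as a requirement on the main
# evaluation HIT so that only screened workers can accept it.
qualification_requirements = [{
    "QualificationTypeId": qual_id,
    "Comparator": "EqualTo",
    "IntegerValues": [1],
}]
# ...passed as QualificationRequirements=qualification_requirements
# to mturk.create_hit(...) when publishing the evaluation task.
```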


Mintaka: A Complex, Natural, and Multilingual Dataset for End-to-End Question Answering

Sen, Priyanka, Aji, Alham Fikri, Saffari, Amir

arXiv.org Artificial Intelligence

We introduce Mintaka, a complex, natural, and multilingual dataset designed for experimenting with end-to-end question-answering models. Mintaka is composed of 20,000 question-answer pairs collected in English, annotated with Wikidata entities, and translated into Arabic, French, German, Hindi, Italian, Japanese, Portuguese, and Spanish for a total of 180,000 samples. Mintaka includes 8 types of complex questions, including superlative, intersection, and multi-hop questions, which were naturally elicited from crowd workers. We run baselines over Mintaka, the best of which achieves 38% hits@1 in English and 31% hits@1 multilingually, showing that existing models have room for improvement. We release Mintaka at https://github.com/amazon-research/mintaka.
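For readers unfamiliar with the metric, hits@1 is simply the fraction of questions whose top-ranked answer matches the gold answer; a minimal sketch (the answer strings below are illustrative, not Mintaka's actual schema):

```python
def hits_at_1(top_predictions, gold_answers):
    """Fraction of questions where the model's top-ranked answer
    equals the gold answer."""
    assert len(top_predictions) == len(gold_answers)
    correct = sum(p == g for p, g in zip(top_predictions, gold_answers))
    return correct / len(gold_answers)

# A model scoring 38% hits@1 gets the top answer right on 38 of 100 questions.
print(hits_at_1(["Q42", "Q64", "Q90"], ["Q42", "Q7", "Q90"]))  # ~0.667
```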


Inside the 1TB ImageNet data set used to train the world's AI: Nude kids, drunken frat parties, porno stars, and more

#artificialintelligence

Special report ImageNet – a data set used to train AI systems around the world – contains photos of naked children, families on the beach, college parties, porn actresses, and more, scraped from the web to train computers without those individuals' explicit consent. The library consists of 14 million images, each placed into categories describing what's pictured in the scene. This pairing of information – images and labels – is used to teach artificially intelligent applications to recognize things and people caught on camera. The database has been downloaded by boffins, engineers, and academics to train hundreds if not thousands of neural networks to identify stuff in photos – from assault rifles and aprons to magpies and minibuses to zebras and zucchinis, and everything in between. In 2012, the data set was used to build AlexNet, heralded as a breakthrough in deep learning because it marked the first time a neural network outperformed traditional computational methods at object recognition accuracy.


A Study on Agreement in PICO Span Annotations

Lee, Grace E., Sun, Aixin

arXiv.org Artificial Intelligence

In evidence-based medicine, relevance of medical literature is determined by predefined relevance conditions. The conditions are defined based on PICO elements, namely, Patient, Intervention, Comparator, and Outcome. Hence, PICO annotations in medical literature are essential for automatic relevant document filtering. However, defining boundaries of text spans for PICO elements is not straightforward. In this paper, we study the agreement of PICO annotations made by multiple human annotators, including both experts and non-experts. Agreement is estimated by a standard span agreement (i.e., matching both labels and boundaries of text spans) and two types of relaxed span agreement (i.e., matching labels without requiring matching span boundaries). Based on the analysis, we report two observations: (i) boundaries of PICO span annotations by individual human annotators are very diverse; (ii) despite the disagreement in span boundaries, the general areas of the spans are broadly agreed upon by annotators. Our results suggest that applying a standard agreement measure alone may underestimate the agreement on PICO spans, and that adopting both standard and relaxed agreement measures is more suitable for PICO span evaluation.
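To make the two criteria concrete, the sketch below contrasts standard (exact-boundary) and relaxed span matching between two annotators; it assumes an overlap-based relaxation, which may differ in detail from the paper's definitions:

```python
def standard_match(a, b):
    """Standard agreement: labels and exact boundaries must both match."""
    return (a["label"] == b["label"]
            and (a["start"], a["end"]) == (b["start"], b["end"]))

def relaxed_match(a, b):
    """Relaxed agreement (assumed overlap-based): labels match and the
    spans overlap, without requiring identical boundaries."""
    return (a["label"] == b["label"]
            and a["start"] < b["end"] and b["start"] < a["end"])

# Two annotators mark the same Intervention with different boundaries:
span_a = {"label": "Intervention", "start": 10, "end": 25}
span_b = {"label": "Intervention", "start": 12, "end": 30}
print(standard_match(span_a, span_b))  # False: boundaries differ
print(relaxed_match(span_a, span_b))   # True: same label, overlapping area
```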


Use Amazon Mechanical Turk with Amazon SageMaker for supervised learning (Amazon Web Services)

#artificialintelligence

Supervised learning needs labels, or annotations, that tell the algorithm what the right answers are during the training phase of your project. In fact, many of the examples of using MXNet, TensorFlow, and PyTorch start with annotated data sets you can use to explore the various features of those frameworks. Unfortunately, when you move from the examples to a real application, it's much less common to have a fully annotated set of data at your fingertips. This tutorial shows how you can use Amazon Mechanical Turk (MTurk) from within your Amazon SageMaker notebook to get annotations for your data set and use them for training. TensorFlow provides an example that uses an Estimator with a neural network classifier to classify irises.
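The same requester API the tutorial relies on is reachable from a SageMaker notebook through boto3; here is a condensed, hypothetical sketch of publishing an annotation HIT and collecting the answers for training (the form URL and task details are placeholders, not the tutorial's exact code):

```python
import boto3

# MTurk sandbox endpoint for testing; drop endpoint_url for production.
mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

# ExternalQuestion pointing at a placeholder annotation form.
question_xml = """
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://example.com/label-iris.html</ExternalURL>
  <FrameHeight>600</FrameHeight>
</ExternalQuestion>"""

hit = mturk.create_hit(
    Title="Label an iris flower",
    Description="Pick the species that best matches the measurements shown",
    Reward="0.05",
    MaxAssignments=3,                 # redundant labels for majority voting
    LifetimeInSeconds=86400,
    AssignmentDurationInSeconds=300,
    Question=question_xml,
)

# Later: pull submitted answers and fold them into the training set.
assignments = mturk.list_assignments_for_hit(
    HITId=hit["HIT"]["HITId"],
    AssignmentStatuses=["Submitted", "Approved"],
)
for a in assignments["Assignments"]:
    print(a["WorkerId"], a["Answer"])  # Answer is an XML payload to parse
```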


How YouTube Uses Mechanical Turk Tasks to Help Train Its AI

WIRED

It's no secret that YouTube has struggled to moderate the videos on its platform over the past year. The company has faced repeated scandals over its inability to rid itself of inappropriate and disturbing content, including some videos aimed at children. Often missing from the discussion over YouTube's shortcomings, though, are the employees directly tasked with removing things like porn and graphic violence, as well as the contractors who help train AI to detect unwelcome uploads. But a Mechanical Turk task shared with WIRED appears to provide a glimpse into what training one of YouTube's machine learning tools looks like at the ground level. MTurk is an Amazon-owned marketplace where corporations and academic researchers pay individual contractors to perform micro-sized services--called Human Intelligence Tasks--in exchange for a small sum, usually less than a dollar.